In the following notebook, we’ll apply some of the GIS concepts and methods from the last seminar, and we’ll fit the urban scaling relationship of Hungarian “kocsma” data! First, let’s load the necessary libraries.
library(sf)
library(magrittr)
library(dplyr)
library(tidyr)
library(ggplot2)
This is going to be our working Cartesian coordinate system. This code corresponds to the so-called EOV system, that is the most commonly used Cartesian coordinate system for Hungary.
project_crs <- sf::st_crs("+init=epsg:23700")
The following file has been downloaded from Geofabrik, it contains all the OpenStreetMap settlement center data from Hungary. We load the dataset, then transform it to our desired coordinate system. We filter the dataset to leave out villages and Bratislava, that somehow happened to land in a Hungarian dataset. We can see the settlement classes in the fclass attribute of the dataset.
settlements <- sf::read_sf('shapes/gis_osm_places_free_1.shp') %>% st_transform(project_crs)
head(settlements)
## Simple feature collection with 6 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 486269.1 ymin: 84168.97 xmax: 580720.5 ymax: 162998.1
## CRS: +init=epsg:23700
## # A tibble: 6 x 6
## osm_id code fclass population name geometry
## <chr> <int> <chr> <dbl> <chr> <POINT [m]>
## 1 15350070 1003 village 666 Som (580720.5 162998.1)
## 2 15370125 1003 village 486 Somogyapáti (549658.6 84168.97)
## 3 15370132 1003 village 232 Patosfa (543251.5 87992.11)
## 4 15731237 1002 town 4301 Tab (572921 155039)
## 5 15733040 1003 village 832 Sormás (486269.1 126249)
## 6 16769958 1003 village 504 Hetvehely (572459.9 88004.91)
unique(settlements$fclass)
## [1] "village" "town" "city" "hamlet"
## [5] "suburb" "national_capital" "farm" "locality"
## [9] "island" "region" "county"
plot(settlements["fclass"])
Let’s filter the above data to cities. We actually get back the 326 cities in Hungary (compared to Wikipedia data, we could have chosen a more accurate validation…)
cities <- settlements %>% filter(fclass == "town" | fclass == "city" | fclass == "national_capital" & name != "Bratislava")
plot(cities["fclass"])
head(cities)
## Simple feature collection with 6 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 471380.6 ymin: 56223.58 xmax: 604029.7 ymax: 155039
## CRS: +init=epsg:23700
## # A tibble: 6 x 6
## osm_id code fclass population name geometry
## <chr> <int> <chr> <dbl> <chr> <POINT [m]>
## 1 15731237 1002 town 4301 Tab (572921 155039)
## 2 17502356 1002 town 2406 Villány (604029.7 58512.92)
## 3 17550787 1001 city 145347 Pécs (586608.3 81633.75)
## 4 18331125 1002 town 23388 Komló (589313.3 94373.22)
## 5 20938029 1002 town 4303 Harkány (586957.1 56223.58)
## 6 21371958 1002 town 4060 Letenye (471380.6 123631.1)
The next file is still an OSM dump, it contains amenity coordinates. Each record contains a so-called fclass field, that corresponds to the type of the amenity.
places <- sf::read_sf('shapes/gis_osm_pois_free_1.shp') %>% st_transform(project_crs)
head(places)
## Simple feature collection with 6 features and 4 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 478968.1 ymin: 118009.4 xmax: 766210.3 ymax: 297601.9
## CRS: +init=epsg:23700
## # A tibble: 6 x 5
## osm_id code fclass name geometry
## <chr> <int> <chr> <chr> <POINT [m]>
## 1 25606346 2701 tourist_info Trojmedzie SK-A-H (509238.4 297601.9)
## 2 26748406 2907 camera_surveillance <NA> (717655.9 265810.3)
## 3 26861240 2422 camp_site Bad Gunaras (582695 118009.4)
## 4 26861244 2422 camp_site Castrum (702134.3 206533)
## 5 26861251 2422 camp_site Füzes (766210.3 238484.7)
## 6 26861254 2422 camp_site Gyógycamping (478968.1 228487.3)
sort(unique(places$fclass))
## [1] "alpine_hut" "archaeological" "arts_centre"
## [4] "artwork" "atm" "attraction"
## [7] "bakery" "bank" "bar"
## [10] "battlefield" "beauty_shop" "bench"
## [13] "beverages" "bicycle_rental" "bicycle_shop"
## [16] "biergarten" "bookshop" "butcher"
## [19] "cafe" "camera_surveillance" "camp_site"
## [22] "car_dealership" "car_rental" "car_sharing"
## [25] "car_wash" "caravan_site" "castle"
## [28] "chalet" "chemist" "cinema"
## [31] "clothes" "college" "comms_tower"
## [34] "community_centre" "computer_shop" "convenience"
## [37] "courthouse" "dentist" "department_store"
## [40] "doctors" "dog_park" "doityourself"
## [43] "drinking_water" "embassy" "fast_food"
## [46] "fire_station" "florist" "food_court"
## [49] "fountain" "furniture_shop" "garden_centre"
## [52] "gift_shop" "golf_course" "graveyard"
## [55] "greengrocer" "guesthouse" "hairdresser"
## [58] "hospital" "hostel" "hotel"
## [61] "hunting_stand" "ice_rink" "jeweller"
## [64] "kindergarten" "kiosk" "laundry"
## [67] "library" "lighthouse" "mall"
## [70] "memorial" "mobile_phone_shop" "monument"
## [73] "motel" "museum" "newsagent"
## [76] "nightclub" "nursing_home" "observation_tower"
## [79] "optician" "outdoor_shop" "park"
## [82] "pharmacy" "picnic_site" "pitch"
## [85] "playground" "police" "post_box"
## [88] "post_office" "prison" "pub"
## [91] "recycling" "recycling_clothes" "recycling_glass"
## [94] "recycling_metal" "recycling_paper" "restaurant"
## [97] "ruins" "school" "shelter"
## [100] "shoe_shop" "sports_centre" "sports_shop"
## [103] "stadium" "stationery" "supermarket"
## [106] "swimming_pool" "telephone" "theatre"
## [109] "theme_park" "toilet" "tourist_info"
## [112] "tower" "town_hall" "toy_shop"
## [115] "track" "travel_agent" "university"
## [118] "vending_any" "vending_machine" "vending_parking"
## [121] "veterinary" "video_shop" "viewpoint"
## [124] "waste_basket" "wastewater_plant" "water_mill"
## [127] "water_tower" "water_well" "water_works"
## [130] "wayside_cross" "wayside_shrine" "windmill"
## [133] "zoo"
interesting <- c("atm","bank","bar","cafe","pharmacy","supermarket","convenience","pub", "playground")
places <- places %>% filter(fclass %in% interesting)
head(places %>% as.data.frame %>%group_by(fclass) %>% summarise(total = n()))
## # A tibble: 6 x 2
## fclass total
## <chr> <int>
## 1 atm 1484
## 2 bank 1375
## 3 bar 548
## 4 cafe 1971
## 5 convenience 6623
## 6 pharmacy 1635
Where are all the pubs?
plot(places %>% filter(fclass=="pub") %>% select(code))
How do we sort the amenities into cities? For that, here’s an administrative boundary file from https://data2.openstreetmap.hu/hatarok/index.php?admin=8. This is also OSM data, but it was somehow missing from the previous dump.
settlement_shapes <- sf::read_sf('shapes/admin8.shp') %>% st_transform(project_crs)
plot(settlement_shapes["ADMIN_LEVE"])
We first join the city centers with population data to the administrative boundaries.
indexmap <- settlement_shapes %>% sf::st_contains(cities) %>% as.data.frame %>% set_colnames(c("id1","id2"))
cities_w_pop <- cbind(settlement_shapes[indexmap$id1,],cities[indexmap$id2,])%>% select(-geometry.1)
cities_w_pop$log_pop <- log10(cities_w_pop$population)
plot(cities_w_pop["log_pop"])
Then we can ask whether the OSM amenities fall within an administrative boundary. We create a dataframe called scaling_data from some interesting variables.
indexmap <- cities_w_pop %>% st_contains(places) %>% as.data.frame %>% set_colnames(c("id1","id2"))
cities_w_places <- cbind(cities_w_pop[indexmap$id1,],places[indexmap$id2,]) %>% select(-geometry.1)
scaling_data <- cities_w_places %>% as.data.frame %>% group_by(name,fclass.1,population) %>% count() %>% pivot_wider(names_from = fclass.1, values_from = n)
Now we calculate some helper variables, and we merge some of the classes like pub and bar to kocsma.
scaling_data$log10_pop = log10(scaling_data$population)
scaling_data$log_pop = log(scaling_data$population)
scaling_data$bolt = scaling_data$supermarket %>% replace_na(0) + scaling_data$convenience %>% replace_na(0)
scaling_data$kocsma = scaling_data$bar %>% replace_na(0) + scaling_data$pub %>% replace_na(0)
And here are our first urban scaling plots.
ggplot(scaling_data) + geom_point(aes(x=population,y=kocsma)) + scale_x_log10() + scale_y_log10()
## Warning: Transformation introduced infinite values in continuous y-axis
ggplot(scaling_data) + geom_point(aes(x=population,y=bolt)) + scale_x_log10() + scale_y_log10()
## Warning: Transformation introduced infinite values in continuous y-axis
Let’s fit kocsma scaling with 3 different methods: OLS, NLS and MLE.
col = "kocsma"
newcol = paste0(col,"_log")
scaling_data[newcol] = log(scaling_data[col])
fit_data <- scaling_data %>% filter(kocsma>0 & population>0)
ols <- lm(kocsma_log ~ log_pop, data = fit_data)
ols
##
## Call:
## lm(formula = kocsma_log ~ log_pop, data = fit_data)
##
## Coefficients:
## (Intercept) log_pop
## -6.2374 0.8613
nonlin_ls <- nls(kocsma~a*exp(b*log(population)),start=list(a=1e-1,b=1),data=fit_data)
nonlin_ls
## Nonlinear regression model
## model: kocsma ~ a * exp(b * log(population))
## data: fit_data
## a b
## 0.0008666 0.9633598
## residual sum-of-squares: 16066
##
## Number of iterations to convergence: 5
## Achieved convergence tolerance: 6.068e-08
Plotting our results:
fit_data$kocsma_predict_ols = exp(predict(ols,newdata = fit_data))
fit_data$kocsma_predict_nls = predict(nonlin_ls,newdata=fit_data)
ggplot(fit_data) +
geom_point(aes(x=population,y=kocsma,color="data")) +
geom_line(aes(x=population,y=kocsma_predict_ols,color="OLS")) +
geom_line(aes(x=population,y=kocsma_predict_nls,color="NLS")) +
scale_x_log10() +
scale_y_log10() +
scale_color_discrete(name ="Methods")
Creating a maximum likelihood function according to the person model of Leitao et al. (2016), plotting the LL as a function of the parameter, selecting maximum LL parameter, plotting all three fitting methods together.
x=fit_data$population
y=fit_data$kocsma
ytot = sum(y)
Z <- function(beta){
return(sum(x**beta))
}
LL <- function(beta){
R =y*log(x**beta/Z(beta))
return(sum(R[!is.infinite(R)]))
}
t=seq(from = 0.8, to = 1.2, by = 0.001)
lt <- lapply(t,LL)
plot(t,lt)
We choose the maximum likelihood exponent to be our final beta for this method.
final_beta <- t[which.max(lt)]
print(final_beta)
## [1] 0.954
fit_data$kocsma_predict_mle = ytot/Z(final_beta)*fit_data$population**final_beta
ggplot(fit_data) +
geom_point(aes(x=population,y=kocsma,color="data")) +
geom_line(aes(x=population,y=kocsma_predict_ols,color="OLS")) +
geom_line(aes(x=population,y=kocsma_predict_nls,color='NLS')) +
geom_line(aes(x=population,y=kocsma_predict_mle,color='MLE')) +
scale_x_log10() +
scale_y_log10() +
scale_color_discrete(name ="Methods")
Are SAMIs different than per-capita indicators?
fit_data$name[order(-log(fit_data$kocsma/fit_data$kocsma_predict_mle))][0:20]
## [1] "Balatonföldvár" "Villány" "Badacsonytomaj" "Visegrád"
## [5] "Nagymaros" "Körösladány" "Mélykút" "Berettyóújfalu"
## [9] "Zamárdi" "Gyomaendrőd" "Kerekegyháza" "Barcs"
## [13] "Sümeg" "Velence" "Herend" "Lajosmizse"
## [17] "Pécsvárad" "Békéscsaba" "Bük" "Tokaj"
fit_data$name[order(-fit_data$kocsma/fit_data$population)][0:20]
## [1] "Balatonföldvár" "Villány" "Badacsonytomaj" "Visegrád"
## [5] "Nagymaros" "Körösladány" "Mélykút" "Zamárdi"
## [9] "Berettyóújfalu" "Kerekegyháza" "Gyomaendrőd" "Sümeg"
## [13] "Herend" "Velence" "Barcs" "Pécsvárad"
## [17] "Bük" "Lajosmizse" "Tokaj" "Balatonalmádi"
Is this model more likely than a linear one? At least we can be sure it is.
delta_bic = -2*LL(1)+2*LL(final_beta)
delta_bic
## [1] 34.26633